- EPISTAT
- Statistical Package
- for the IBM Personal Computer
-
- Version 3.0, 1984
-
-
-
-
-
- Written by:
-
- Tracy L. Gustafson, M.D.
-
- Copyright 1984
-
-
- INTRODUCTION
-
-
- EPISTAT is a collection of programs written in BASICA for
- statistical analysis of small to medium-sized data samples (up to 28
- samples or variables and up to 2000 total data entries per file).
- The 25 programs in EPISTAT perform more than 40 common statistical
- tests or functions and provide utilities for data entry, editing,
- printing, graphing, sorting, selecting, transforming and crosstabs.
-
- The programs are intended to be as self-explanatory and user-
- friendly as possible. You do not need to memorize this guide
- before using the programs. On the other hand, neither the programs
- nor this manual purport to TEACH the proper use or interpretation
- of statistics. The user must have some familiarity with the kinds
- of data required and the underlying assumptions appropriate to each
- statistical test.
-
-
- For further explanations of tests, refer to:
-
- 1. Colton, Theodore. Statistics in Medicine. Little, Brown and Co.
- Boston, 1974.
- 2. Fleiss, Joseph. Statistical Methods for Rates and Proportions.
- John Wiley and Sons. New York, 1981.
- 3. Snedecor, George W. and Cochran, William G. Statistical Methods.
- Iowa State Univ. Press. Ames, Iowa, 1978.
- 4. Schlesselman, James. Case-Control Studies. Oxford Univ. Press.
- New York, 1982.
-
-
-
-
-
-
-
-
-
-
- CAVEAT:
- These programs have been tested extensively, but I cannot
- guarantee that they will work correctly with every possible data set.
- Incorrect results are usually due to errors in format or type of
- data entered. If you believe you have discovered an error in the
- programs, please write me. I intend to correct any bugs that are
- brought to my attention.
- It is good practice to regularly compare the results obtained
- by programs in EPISTAT with results obtained by your previous method
- of calculation. ANY unexpected result should be questioned and
- double-checked by reference to tables or another method of
- calculation.
-
-
- INDEX TO EPISTAT
-
- The following statistical tests and functions are available:
-
- TEST or FUNCTION                                     PROGRAM NAME
- ----------------                                     ------------
- Analysis of variance (1 and 2-way)...................ANOVA
- Bayes' theorem.......................................BAYES
- Binomial distribution................................BINOMIAL
- Chi-square test and distribution.....................CHISQR
- Correlation coefficients.............................CORRELAT
- F distribution.......................................ANOVA
- Fisher's exact test..................................FISHERS
- Linear regression analysis...........................LNREGRES
- Mantel-Haenszel Chi-square test......................MHCHISQR
- Mantel-Haenszel for multiple controls................MHCHIMLT
- McNemar's test.......................................MCNEMAR
- Mean, median and standard deviation..................DATA-ONE
- Normal distribution..................................NORMAL
- Poisson distribution.................................POISSON
- Random sample generator..............................RANDOMIZ
- Rank sum test........................................RANKTEST
- Rates adjusted (direct and indirect).................RATEADJ
- Sample size calculations.............................SAMPLSIZ
- Signed rank test.....................................RANKTEST
- Student's T-test and T distribution..................T-TEST
-
-
-
-
-
-
- The following data-handling capabilities are provided:
-
- DATA MANIPULATION                                    PROGRAM NAME
- -----------------                                    ------------
- Determine best test and program names................EPISTAT
- Graph histograms.....................................HISTOGRM
- Graph scattergrams...................................SCATRGRM
- Perform data transformations.........................LNREGRES
- Print data (sorted or input order)...................DATA-ONE
- Print crosstab reports...............................XTAB
- Select specific records..............................SELECT
- Transfer data between EPISTAT files..................FILETRAN
- Transfer data from FORTRAN to EPISTAT files..........FORTRANS
-
-
- SYSTEM REQUIREMENTS FOR EPISTAT
-
- MINIMUM                       OPTIMAL
- IBM PC with 64K RAM           IBM PC with 96K RAM
- One 160K disk drive           Two 320K disk drives
- Monochrome monitor            Color graphics adapter
- BASICA                        Hi-res color monitor
-                               BASICA
-                               IBM, Epson, Okidata, or
-                               Prowriter printer with
-                               graphics capability
-
-
-
-
- OVERALL PROGRAM DESCRIPTION
-
-
- All calculations in EPISTAT are performed using single precision.
- Although it may first appear that double precision would be more
- appropriate for statistical tests, "double" precision makes little or
- no real improvement in precision in these programs. Many of the
- algorithms used to evaluate p values use trigonometric functions which
- are calculated in single precision anyway. For best results, data
- entries should have magnitudes between 1E-7 and 1E+7. Larger or smaller
- numbers should be multiplied by an appropriate power of 10 before
- entry and analysis in EPISTAT.
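To illustrate why magnitudes outside roughly 1E-7 to 1E+7 cause trouble, the sketch below rounds Python floats to IEEE 754 single precision with the standard `struct` module. This is an approximation, not a reproduction: BASICA stores single-precision numbers in Microsoft Binary Format rather than IEEE 754, but both keep a 24-bit mantissa (about 7 significant decimal digits). The helper name `to_single` is hypothetical.

```python
import struct

def to_single(x: float) -> float:
    """Round a Python float (double precision) to the nearest IEEE 754 single."""
    return struct.unpack("f", struct.pack("f", x))[0]

# A 24-bit mantissa represents integers exactly only up to 2**24 = 16,777,216,
# so near 1E+8 the spacing between representable values is 8 and a +1 is lost.
big = to_single(1e8)
print(to_single(big + 1) == big)        # True: the added 1 disappears

# Near 1E+3 the same addition survives intact.
print(to_single(1000.0 + 1) - 1000.0)   # 1.0
```

Scaling data by a power of 10, as the text recommends, moves values back into the range where about 7 significant digits are enough.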
-
-
- All EPISTAT programs are written so that as much pertinent
- information about the test as possible can fit on the final screen.
- This feature allows a summary printed copy to be produced simply by
- pressing <Shift-PrtSc>. This will work any time there is a pause in
- the program display. Six programs, "DATA-ONE", "HISTOGRM",
- "RANDOMIZ", "SCATRGRM", "SELECT", and "XTAB" produce printed reports
- without using <Shift-PrtSc>. In these, follow program instructions
- to route output to your printer.
-
-
- EPISTAT is the introductory program in the EPISTAT package.
- DATA-ONE is the major data entry, editing, and printing program.
- Most of the programs in EPISTAT can evaluate data entered and saved
- using DATA-ONE. Many of the programs can, in addition, evaluate
- summary data. The programs marked with a star (*) below can
- evaluate data entered in DATA-ONE. Non-starred programs provide
- their own data entry routines.
-
-
-
- The EPISTAT disk should be placed in drive A (or other default
- drive) when loading any program because "EPIMRG" and "EPISETUP.DAT"
- are used by every program. Once a program is running, the EPISTAT
- disk can be removed from drive A if necessary.
-
-
- INDIVIDUAL PROGRAM DESCRIPTIONS
-
-
- (1) "EPISTAT"
- This introductory program lists the available programs and aids
- the user in selecting the best statistical test. It also allows one
- to specify hardware configuration and colors for a color monitor.
- Choose colors 7,0,0 if you have a monochrome monitor connected to
- the color/graphics adapter. If yours is not one of the listed printers,
- check your printer's codes for the typeface you want. For example,
- the code for elite type on the Prowriter is ESC "E". If you press
- Escape then E, the display will show the decimal ASCII codes: 27 69.
- An alternate method is to press <Alt> and enter the decimal code on
- the numeric keypad. Press <Enter> when the complete code is entered.
-
- (2) "DATA-ONE" *
-
- DATA ENTRY:
- This is the central keyboard data entry program for the EPISTAT
- package (for non-keyboard data entry, see FILETRAN and FORTRANS).
- Initial data entry (Option 1) first asks you to name your samples or
- variables. Then type in the data, pressing <Enter> twice after each
- entry. The maximum number of samples or variables (S) allowed is
- 28 with a color adapter and 7 with a monochrome adapter. The maximum
- number of records in each sample is 2000/S. A blank record can be
- entered by pressing <Enter> then key F2. To exit, press <Enter> then
- key F10. The mean, median and (n-1) standard deviation are then
- displayed. When you return to the main menu, SAVE your datafile to
- disk (Option 5) for future modification or use by other programs
- in the EPISTAT package.
- Although DATA-ONE treats all entries in a datafile as numbers, it is
- possible to enter characters (names) in a record. Characters are
- treated as zeros in calculations, but using the "Sample 1" column for
- record or case names makes the data easier to read. In this way,
- DATA-ONE lets you name each column (variable) and each row (case) in
- the datafile.
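The limits and summary statistics described above can be sketched as follows. This is an illustration in Python, not DATA-ONE's BASICA code: the function names are hypothetical, the 2000/S limit is assumed to round down, and character entries are converted to zero as the text describes.

```python
import statistics

MAX_ENTRIES = 2000  # total data entries allowed per datafile

def max_records_per_sample(num_samples: int) -> int:
    """Records allowed in each sample when a datafile holds num_samples samples."""
    return MAX_ENTRIES // num_samples  # assumption: 2000/S rounded down

def as_number(entry: str) -> float:
    """Convert one entry; character (name) entries count as zero."""
    try:
        return float(entry)
    except ValueError:
        return 0.0

def summarize(entries: list[str]) -> tuple[float, float, float]:
    """Mean, median and (n-1) standard deviation of one sample."""
    values = [as_number(e) for e in entries]
    return (statistics.mean(values),
            statistics.median(values),
            statistics.stdev(values))  # stdev divides by n-1

print(max_records_per_sample(28))  # 71
print(summarize(["2", "4", "4", "5"]))
```

Note that a name typed into a data column would silently pull the mean toward zero, which is one reason to keep names in the "Sample 1" column.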
-
- DATA MODIFICATION:
- APPEND (Option 2) allows one to add more observations to a sample
- at a later session. EDIT (Option 3) allows one to delete or replace
- incorrect data entries and to change sample or variable names. When
- you return to the main menu, SAVE modified data to disk again.
-
-
- PRINTING DATA:
- To view or review a datafile, a printout to screen or printer can
- be selected (Option 4). To print a datafile exactly as it was keyed in,
- request the printout in INPUT order. DATA-ONE can also print the
- data SORTED by any selected sample. Only numeric data is sorted by
- DATA-ONE, so it will not alphabetize a character field. Blank records
- are not sorted, either.
-
- SAVING DATAFILES and LOADING DATAFILES:
- SAVING data (Option 5) writes your data to disk in a sequential
- file for later editing, review, or use by another program. DATA MUST
- BE SAVED TO DISK before it can be used by other programs in EPISTAT.
- Since EPISTAT must be in drive A: (or other default drive) to begin,
- you will probably want to SAVE datafiles on drive B. To do so,
- precede each datafile name with B: (e.g. B:TESTDATA). Do not enclose
- filenames in quotation marks.
-
-
- (3) "ANOVA" *
-
- Provides ONE-way and TWO-way analysis of variance. One-way ANOVA
- compares the means of 3 or more samples. Two-way ANOVA compares the
- combined effects of 2 variables on a third (ROW and COLUMN effects).
- All samples in two-way ANOVA must have the same number of elements.
- ANOVA prints sample means, (n-1) variances and sums of squares.
- It also evaluates a known F value. (Snedecor, pp. 258-338)
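As an illustration of the one-way case, the sketch below forms the between-sample and within-sample sums of squares and the F ratio. It stops at F and its degrees of freedom; converting F to a p value requires the F distribution, which is not reproduced here. The function name and interface are hypothetical.

```python
def one_way_anova(samples: list[list[float]]) -> tuple[float, int, int]:
    """Return (F, df_between, df_within) for a one-way analysis of variance."""
    means = [sum(s) / len(s) for s in samples]
    n_total = sum(len(s) for s in samples)
    grand_mean = sum(sum(s) for s in samples) / n_total

    # Between-samples sum of squares: distance of each sample mean from the grand mean.
    ss_between = sum(len(s) * (m - grand_mean) ** 2 for s, m in zip(samples, means))
    # Within-samples sum of squares: scatter of observations about their own mean.
    ss_within = sum((x - m) ** 2 for s, m in zip(samples, means) for x in s)

    df_between = len(samples) - 1
    df_within = n_total - len(samples)
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within

print(one_way_anova([[1, 2, 3], [2, 3, 4], [5, 6, 7]]))
```

For these three samples the sums of squares are 26 and 6, giving F of approximately 13 with 2 and 6 degrees of freedom.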
-
- (4) "BAYES"
-
- Using Bayes' theorem, this program calculates the rates of false
- positive and false negative tests given different sensitivities,
- specificities and outcome incidences. Using the formula in a different
- way, it calculates the posterior probability of several outcomes given a
- positive test. (Fleiss, p. 5)
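A minimal sketch of both uses of Bayes' theorem follows. It assumes that the "false positive" and "false negative" rates mean the proportion of positive (or negative) test results that are wrong, which is the reading under which the outcome incidence matters. The function names are hypothetical, and this is not BAYES's own code.

```python
def predictive_values(sensitivity: float, specificity: float, prevalence: float):
    """Return (P(outcome | positive test), P(no outcome | negative test))."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    true_neg = specificity * (1 - prevalence)
    false_neg = (1 - sensitivity) * prevalence
    return true_pos / (true_pos + false_pos), true_neg / (true_neg + false_neg)

def posterior(priors: list[float], likelihoods: list[float]) -> list[float]:
    """Posterior probability of each outcome, given P(positive test | outcome)."""
    joint = [p * l for p, l in zip(priors, likelihoods)]
    total = sum(joint)
    return [j / total for j in joint]

ppv, npv = predictive_values(0.95, 0.90, 0.01)
print(f"false positives among positive tests: {1 - ppv:.3f}")
print(f"false negatives among negative tests: {1 - npv:.4f}")
```

With a rare outcome (1%), even a test with 95% sensitivity and 90% specificity produces mostly false positives.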
-
- (5) "BINOMIAL"
-
- The binomial distribution allows calculation of the probability
- of an observed number compared to the expected. It assumes the variable
- is dichotomous and has the same probability of occurring in each
- independent trial. This program calculates the ONE-tailed probability
- of the observed number and all more extreme outcomes. For example, for
- 2 heads in 10 tosses of a coin, the ONE-tailed probability is the sum
- of the probabilities of 0, 1 and 2 heads. (Colton, p. 151)
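The coin example can be checked with a short sketch. It sums exact binomial probabilities in the tail on the observed side of the expected count n*p; when the observed count equals the expected one, the sketch simply takes the lower tail, which is an assumption since the text does not say what BINOMIAL does in that case.

```python
from math import comb

def binomial_one_tailed(k: int, n: int, p: float) -> float:
    """P(X <= k) if k is at or below the expected n*p, otherwise P(X >= k)."""
    def prob(i: int) -> float:
        return comb(n, i) * p**i * (1 - p) ** (n - i)
    if k <= n * p:
        return sum(prob(i) for i in range(0, k + 1))
    return sum(prob(i) for i in range(k, n + 1))

print(binomial_one_tailed(2, 10, 0.5))  # 0.0546875, i.e. (1 + 10 + 45) / 1024
```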
-
- (6) "CHISQR"
-
- The Chi-square program evaluates a table of data or a known
- chi-square value. 2 by 2 tables are evaluated using Yates' correction
- and the odds ratio and its confidence limits are calculated using
- Cornfield's method (Schlesselman, pp. 175, 177). A Chi-square test
- for trend can also be performed (Schlesselman, p. 201).
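For a 2 by 2 table with cells a, b (first row) and c, d (second row), the Yates-corrected chi-square and the odds ratio can be sketched as below. Cornfield's iterative confidence limits are not reproduced, and the function names are hypothetical.

```python
def yates_chi_square(a: int, b: int, c: int, d: int) -> float:
    """Yates-corrected chi-square (1 df) for the 2 by 2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    # Continuity correction: shrink |ad - bc| by n/2, never below zero.
    diff = max(0.0, abs(a * d - b * c) - n / 2)
    return n * diff**2 / ((a + b) * (c + d) * (a + c) * (b + d))

def odds_ratio(a: int, b: int, c: int, d: int) -> float:
    """Cross-product odds ratio ad / bc."""
    return (a * d) / (b * c)

print(yates_chi_square(20, 10, 10, 20), odds_ratio(20, 10, 10, 20))
```

For this table the corrected chi-square is about 5.4 and the odds ratio is 4.0.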
-
- (7) "CORRELAT" *
-
- Pearson's correlation coefficient and Spearman's rank correlation
- assess the relationship between paired variables. The probability
- of a given Pearson R value is evaluated using the T distribution.
- (Colton, p. 212)
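The link between Pearson's R and the T distribution can be sketched as follows: under the null hypothesis of no correlation, T = R*sqrt(n-2)/sqrt(1-R^2) follows a T distribution with n-2 degrees of freedom. The function names are hypothetical, and the p value lookup itself is not shown.

```python
import math

def pearson_r(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient of paired observations."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def t_for_r(r: float, n: int) -> float:
    """T statistic, with n-2 degrees of freedom, for testing R against zero."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)

r = pearson_r([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(r, t_for_r(r, 5))
```

Spearman's rank correlation is the same Pearson calculation applied to the ranks of each variable (using average ranks for ties).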
-
- (8) "FILETRAN" *
-
- Occasionally you may want to compare 2 samples
- or variables that are in separate datafiles. Or you may have a data
- set with more than 28 variables that you split between two or more
- datafiles. Since EPISTAT programs only allow analysis of samples
- that are in the same datafile, FILETRAN allows you to transfer
- samples between two datafiles. You may create a new datafile by
- selecting one sample from DATAFILE #1 and another from DATAFILE #2.
- FILETRAN can also combine two samples by APPENDING one to the other.
-
-
- (9) "FISHERS"
-
- Fisher's exact test evaluates 2 by 2 tables of discrete variables.
- It is particularly valuable when the Chi-square test is inappropriate
- because the expected value for a cell is < 5. The program can also
- evaluate some larger tables, in which A+B+C+D (the total of the four
- cells) exceeds 200.
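Fisher's exact test enumerates every 2 by 2 table with the same row and column totals and sums hypergeometric probabilities. The sketch below returns a one-tailed value (tables at least as extreme in the observed direction) and a two-tailed value using the common "sum of tables no more probable than the observed" rule. Which tails FISHERS reports is not stated in the text, so both conventions here are assumptions, as is the function name.

```python
from math import comb

def fisher_exact(a: int, b: int, c: int, d: int) -> tuple[float, float]:
    """(one-tailed, two-tailed) exact p values for the table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2

    def p_table(x: int) -> float:
        # Hypergeometric probability that the top-left cell equals x.
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    probs = {x: p_table(x) for x in range(max(0, col1 - row2), min(row1, col1) + 1)}
    p_obs = probs[a]
    if a >= row1 * col1 / n:   # observed cell above its expected value
        one = sum(p for x, p in probs.items() if x >= a)
    else:
        one = sum(p for x, p in probs.items() if x <= a)
    # Small relative tolerance so tables tied with the observed one are counted.
    two = sum(p for p in probs.values() if p <= p_obs * (1 + 1e-7))
    return one, min(1.0, two)

print(fisher_exact(3, 1, 1, 3))
```

Python's integer `comb` avoids the overflow that limits table size in floating-point implementations.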
-
- (10) "FORTRANS"
-
- If your data was previously entered into a FORTRAN or other SDF
- sequential card image file, FORTRANS may be able to transform it into
- an EPISTAT datafile. You must know the record length, appropriate
- column numbers, number of decimal places and missing value code.
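The four items FORTRANS asks for map naturally onto fixed-column parsing. The sketch below is hypothetical: the layout, the missing value code, and the choice to treat a blank field as missing are illustrative assumptions, not FORTRANS's documented behaviour. Fields without an explicit decimal point receive the stated number of implied decimal places, as in FORTRAN F-format input.

```python
def parse_card(line: str, fields: list[tuple[int, int, int]], missing_code: str):
    """Convert one fixed-column record to numbers; None marks a missing value.

    fields: (first_column, last_column, decimal_places), columns counted from 1.
    """
    values = []
    for first, last, decimals in fields:
        text = line[first - 1:last].strip()
        if text == "" or text == missing_code:   # assumption: blank = missing
            values.append(None)
        elif "." in text:                        # explicit decimal point wins
            values.append(float(text))
        else:                                    # implied decimal places
            values.append(int(text) / 10**decimals)
    return values

layout = [(1, 4, 0), (5, 9, 2), (10, 12, 1)]  # hypothetical column layout
print(parse_card("  42 1234999", layout, "999"))  # [42.0, 12.34, None]
```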
-
- (11) "HISTOGRM" *
-
- The histogram program graphs a data sample according to user
- specifications on the high resolution graphics screen. To obtain
- a printed copy on the IBM, Epson, Okidata or Prowriter (specified in
- "EPISTAT") press key F1. Press F10 to return to the program.
-
- (12) "LNREGRES" *
-
- Linear regression analysis calculates the least-squares regression
- line for paired samples. It then uses the T distribution to determine
- if the calculated slope is significantly different from zero. (Colton
- p. 199) LNREGRES also provides a variety of data transformations.
- Transformed data can be saved to disk for future use or printout.
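A minimal sketch of the least-squares fit and the T test of the slope against zero follows (n-2 degrees of freedom). The function name is hypothetical, and the transformations LNREGRES offers are not shown.

```python
import math

def linear_regression(xs: list[float], ys: list[float]):
    """Least-squares slope and intercept, plus T for testing slope = 0."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = my - slope * mx
    # Residual variance about the fitted line uses n-2 degrees of freedom.
    sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    se_slope = math.sqrt(sse / (n - 2) / sxx)
    return slope, intercept, slope / se_slope

print(linear_regression([1, 2, 3, 4, 5], [2, 4, 5, 4, 5]))
```

Here the line is y = 2.2 + 0.6x, and T equals the T obtained from Pearson's R for the same data, as it must for simple linear regression.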
-
- (13) "MHCHISQR"
-
- The Mantel-Haenszel Chi-square test evaluates the relationship
- between two discrete variables while controlling for the effect of
- a third variable. It also calculates an odds ratio and 95% confidence
- limits. (Schlesselman, pp. 183,206)
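The Mantel-Haenszel summary odds ratio and chi-square can be sketched as follows, for strata given as 2 by 2 tables [[a, b], [c, d]]. The 0.5 continuity correction and the function name are assumptions, and the confidence limits are not reproduced.

```python
def mantel_haenszel(strata: list[tuple[int, int, int, int]]) -> tuple[float, float]:
    """Summary odds ratio and continuity-corrected chi-square (1 df)."""
    or_num = or_den = 0.0
    sum_a = sum_expected = sum_var = 0.0
    for a, b, c, d in strata:
        n = a + b + c + d
        or_num += a * d / n
        or_den += b * c / n
        sum_a += a
        sum_expected += (a + b) * (a + c) / n                  # E(a) under no association
        sum_var += (a + b) * (c + d) * (a + c) * (b + d) / (n * n * (n - 1))
    chi_square = (abs(sum_a - sum_expected) - 0.5) ** 2 / sum_var  # assumed correction
    return or_num / or_den, chi_square

print(mantel_haenszel([(10, 20, 5, 25), (8, 12, 4, 16)]))
```

Each stratum contributes its own expected count and variance, which is how the test controls for the third (stratifying) variable.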
-
- (14) "MHCHIMLT" *
-
- The Mantel-Haenszel Chi-square test for multiple controls compares
- a case sample with 2 or more matched control samples, and calculates
- a probability and an odds ratio. (Fleiss, p. 125) MHCHIMLT can
- evaluate summary data or raw data entered using DATA-ONE. If using
- DATA-ONE, data should be coded as "1" for factor present, and "0" for
- factor absent in each case and control sample.
-
- (15) "MCNEMAR"
-
- McNemar's test (paired Chi-square test) evaluates 2 by 2 tables
- of paired discrete variables using Yates' correction and calculates
- an odds ratio and 95% confidence limits. (Schlesselman, p. 210)
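McNemar's test uses only the discordant pairs. In the sketch below, `b` counts pairs in which only the case is exposed and `c` pairs in which only the control is exposed; these names are hypothetical, and the confidence limits are not reproduced.

```python
def mcnemar(b: int, c: int) -> tuple[float, float]:
    """Yates-corrected McNemar chi-square (1 df) and matched-pair odds ratio."""
    chi_square = max(0, abs(b - c) - 1) ** 2 / (b + c)
    return chi_square, b / c

print(mcnemar(15, 5))  # (4.05, 3.0)
```

Concordant pairs (both exposed or both unexposed) carry no information about the difference and do not enter either statistic.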
-
- (16) "NORMAL" *
-
- The normal distribution has innumerable uses in statistics. This
- program specifically addresses three situations: (1) It compares
- a sample mean to a population mean. (2) It calculates the proportion
- of samples that would be expected to fall in any given range under
- the normal curve. (3) It calculates the two-tailed probability
- associated with any given value of z.
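All three uses can be sketched with the standard normal cumulative distribution, written here with `math.erfc`. The comparison of a sample mean with a population mean assumes the population standard deviation is known; the function names are hypothetical.

```python
import math

def normal_cdf(x: float, mean: float = 0.0, sd: float = 1.0) -> float:
    """P(X <= x) for a normal distribution."""
    return 0.5 * math.erfc(-(x - mean) / (sd * math.sqrt(2)))

def z_for_mean(sample_mean: float, pop_mean: float, pop_sd: float, n: int) -> float:
    """(1) z comparing a sample mean with a known population mean."""
    return (sample_mean - pop_mean) / (pop_sd / math.sqrt(n))

def proportion_between(lo: float, hi: float, mean: float, sd: float) -> float:
    """(2) Expected proportion of observations between lo and hi."""
    return normal_cdf(hi, mean, sd) - normal_cdf(lo, mean, sd)

def two_tailed_p(z: float) -> float:
    """(3) P(|Z| >= |z|) for a standard normal Z."""
    return math.erfc(abs(z) / math.sqrt(2))

z = z_for_mean(105, 100, 15, 36)
print(z, two_tailed_p(z), proportion_between(-1, 1, 0, 1))
```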
-
-
- (17) "POISSON"
-
- The Poisson distribution applies to dichotomous variables when
- the number of successes can be counted, but the number of failures
- cannot. This program calculates a ONE-tailed probability.
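A one-tailed Poisson probability can be sketched by summing the distribution from the observed count outward, away from the expected count. Choosing the tail by comparing observed with expected is an assumption about what POISSON does, and the direct formula used here is meant for small counts (large counts would overflow floating point).

```python
import math

def poisson_one_tailed(observed: int, expected: float) -> float:
    """One-tailed probability of `observed` or a more extreme count."""
    def pmf(k: int) -> float:
        return math.exp(-expected) * expected**k / math.factorial(k)
    lower = sum(pmf(k) for k in range(observed + 1))   # P(X <= observed)
    if observed < expected:
        return lower
    return 1.0 - lower + pmf(observed)                 # P(X >= observed)

print(poisson_one_tailed(8, 3.0))
```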
-
- (18) "RANDOMIZ"
-
- This random sample generator aids in the selection of random
- samples for several purposes. It can provide a random subset of a
- larger population, or it can assign cases randomly to independent or
- paired groups for case-control studies.
-
- (19) "RANKTEST" *
-
- Two non-parametric tests of significance are performed by this
- program. They are appropriate for small samples which are clearly NOT
- normally distributed. They also specifically apply when quantitative
- variables are not available but qualitative ranks are. The RANK SUM
- TEST compares 2 independent samples. The SIGNED RANK TEST compares the
- medians of paired samples. RANKTEST calculates the TWO-tailed
- exact probability associated with the various rank sums. Note that
- for samples larger than 20 observations, the latter calculation can
- take several minutes. (Colton, pp. 219-222)
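The exact distribution behind the signed rank test can be computed by counting how many subsets of the ranks 1..n reach each possible sum. The sketch below uses a dynamic-programming count, which is fast even for large samples; it is not necessarily how RANKTEST works. It assumes no tied absolute differences and drops zero differences; the function name is hypothetical.

```python
def signed_rank_exact(diffs: list[float]) -> float:
    """Two-tailed exact p value for the Wilcoxon signed rank test."""
    d = [x for x in diffs if x != 0]
    n = len(d)
    order = sorted(range(n), key=lambda i: abs(d[i]))
    w_plus = sum(rank for rank, i in enumerate(order, start=1) if d[i] > 0)
    total = n * (n + 1) // 2
    w_min = min(w_plus, total - w_plus)

    # counts[s] = number of sign patterns whose positive ranks sum to s.
    counts = [1] + [0] * total
    for r in range(1, n + 1):
        for s in range(total, r - 1, -1):
            counts[s] += counts[s - r]
    return min(1.0, 2 * sum(counts[: w_min + 1]) / 2**n)

print(signed_rank_exact([1.5, 2.0, 0.4, 3.1, 2.6, -0.2]))  # 0.0625
```

The rank sum test for two independent samples can be handled the same way, by counting subsets of a fixed size instead.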
-
-
- (20) "RATEADJ" *
-
- The rate adjustment program will adjust sample rates by either
- the direct or indirect method (Colton, pp. 47-51). For the direct
- method, the datafile must include the study sample rates and the
- standard population figures. For indirect adjustment, the datafile
- used must include the study population figures and the standard
- population rates. For indirect rate adjustment, RATEADJ evaluates
- the probability of the observed number of cases using the ONE-tailed
- Poisson distribution for small numbers, or the Chi-square
- distribution for large numbers.
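Both adjustments can be sketched in a few lines, with one entry per stratum (for example, per age group). The function names and example figures are hypothetical. For indirect adjustment, the expected count is presumably what the Poisson or Chi-square test compares with the observed number of cases.

```python
def direct_adjusted_rate(study_rates: list[float], standard_pop: list[float]) -> float:
    """Direct method: apply the study's stratum rates to a standard population."""
    expected_cases = sum(r * p for r, p in zip(study_rates, standard_pop))
    return expected_cases / sum(standard_pop)

def indirect_adjustment(observed_cases: int, study_pop: list[float],
                        standard_rates: list[float]) -> tuple[float, float]:
    """Indirect method: expected cases and the observed/expected ratio."""
    expected = sum(p * r for p, r in zip(study_pop, standard_rates))
    return expected, observed_cases / expected

print(direct_adjusted_rate([0.01, 0.05], [1000, 500]))
print(indirect_adjustment(75, [2000, 1000], [0.01, 0.04]))
```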
-
- (21) "SAMPLSIZ"
-
- The sample size program calculates the approximate sample sizes
- required to achieve statistical significance given certain specified
- levels of certainty. Adjustments are made if the user desires more
- than one control per case. (Schlesselman, p. 168)
-
- For a survey: TP = total population pi = population proportion
- d = maximum acceptable error in sample proportion
-
- n = [ z(a)*SQR(pi*(1-pi)) / d ] squared
- N = n / (1+n/TP)
-
- For a paired case-control study: (Colton, p. 161)
-
- N = [(z(a)*SQR(pi*(1-pi)) + |z(b)|*SQR(PT*(1-PT))) / (PT-pi)] squared
-
- For an unpaired case-control study: (Fleiss, p. 41)
-
- N = [(z(a)*SQR(2*pi*(1-pi)) + |z(b)|*SQR(PT*(1-PT)+PC*(1-PC))) / (PT-PC)] squared
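The three sample size formulas translate directly into code. In this sketch the z values are passed in rather than looked up (for example 1.96 for a two-sided 5% level and 0.84 for about 80% power). For the unpaired formula it assumes, following Fleiss, that pi is the average of PT and PC. Function names are hypothetical, and results should be rounded up to whole subjects.

```python
import math

def survey_size(z_a: float, pi: float, d: float, total_pop: float) -> float:
    n = (z_a * math.sqrt(pi * (1 - pi)) / d) ** 2
    return n / (1 + n / total_pop)  # finite population correction

def paired_size(z_a: float, z_b: float, pi: float, pt: float) -> float:
    num = z_a * math.sqrt(pi * (1 - pi)) + abs(z_b) * math.sqrt(pt * (1 - pt))
    return (num / (pt - pi)) ** 2

def unpaired_size(z_a: float, z_b: float, pt: float, pc: float) -> float:
    pi = (pt + pc) / 2  # ASSUMPTION: pi = mean of PT and PC, as in Fleiss
    num = (z_a * math.sqrt(2 * pi * (1 - pi))
           + abs(z_b) * math.sqrt(pt * (1 - pt) + pc * (1 - pc)))
    return (num / (pt - pc)) ** 2

print(math.ceil(survey_size(1.96, 0.5, 0.05, 10000)))
print(math.ceil(unpaired_size(1.96, 0.84, 0.30, 0.15)))
```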
-
-
- (22) "SCATRGRM" *
-
- The scattergram program graphs paired variables according to
- user specifications on the hi-res graphics screen. To add the linear
- regression line, press key F5. To obtain a printed copy on the IBM,
- Epson, Okidata or Prowriter (specified in "EPISTAT"), press key F1.
- Press key F10 to return to the program.
-
- (23) "SELECT" *
-
- This program allows the user to select any combination of
- records for printout. It can also create a new disk datafile that
- is a select subset of the original. One can select on any variable
- with "AND" and "OR" specifications. As many as 10 selection criteria
- can be set at one time. SELECT gives "AND" precedence over "OR", as
- though each "AND" clause were enclosed in parentheses.
- For example:
- "SELECT IF Sample #1>10 AND Sample #2=1 OR Sample #1<Sample #3"
- is interpreted as meaning:
- "SELECT IF (Sample #1>10 AND Sample #2=1) OR Sample #1<Sample #3"
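The precedence rule amounts to splitting the criteria at each "OR" into groups of ANDed tests and selecting a record if any group is entirely true. In this hypothetical sketch each criterion is a Python predicate on a record, with "Sample #1" stored as `record[0]` and so on.

```python
def select(record, tests, connectors) -> bool:
    """Apply SELECT-style criteria with AND binding more tightly than OR.

    tests: predicates taking a record; connectors[i] ("AND" or "OR")
    joins tests[i] to tests[i + 1].
    """
    if len(tests) > 10:
        raise ValueError("SELECT allows at most 10 selection criteria")
    groups = [[tests[0](record)]]      # each group is a run of ANDed tests
    for connector, test in zip(connectors, tests[1:]):
        if connector == "OR":
            groups.append([])          # OR closes one parenthesized group
        groups[-1].append(test(record))
    return any(all(group) for group in groups)

tests = [lambda r: r[0] > 10, lambda r: r[1] == 1, lambda r: r[0] < r[2]]
print(select([12, 0, 20], tests, ["AND", "OR"]))  # True: (F AND...) OR True
print(select([12, 0, 5], tests, ["OR", "AND"]))   # True: A OR (B AND C)
```

The second call shows where precedence matters: strict left-to-right evaluation, (A OR B) AND C, would reject that record.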
-
- (24) "T-TEST" *
-
- Student's T-test compares the means of two samples. The
- program provides both paired and unpaired T-test calculations.
- Variances (n-1) are displayed and, for independent samples, the
- equality of variances is tested to check that the assumptions
- of the T-test are met (Snedecor, p. 116). T-TEST will also
- evaluate a known T value.
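A minimal sketch of both forms follows. The unpaired version uses the pooled (n-1) variance and reports the ratio of the larger to the smaller variance, which would be compared with the F distribution to check the equal-variance assumption. The function names are hypothetical, and p values are not computed.

```python
import math
import statistics

def unpaired_t(x: list[float], y: list[float]):
    """Pooled-variance T, its degrees of freedom, and the variance ratio F."""
    nx, ny = len(x), len(y)
    vx, vy = statistics.variance(x), statistics.variance(y)  # (n-1) variances
    pooled = ((nx - 1) * vx + (ny - 1) * vy) / (nx + ny - 2)
    t = (statistics.mean(x) - statistics.mean(y)) / math.sqrt(pooled * (1 / nx + 1 / ny))
    return t, nx + ny - 2, max(vx, vy) / min(vx, vy)

def paired_t(x: list[float], y: list[float]):
    """T for the mean of paired differences, with n-1 degrees of freedom."""
    d = [a - b for a, b in zip(x, y)]
    return statistics.mean(d) / (statistics.stdev(d) / math.sqrt(len(d))), len(d) - 1

print(unpaired_t([1, 2, 3, 4, 5], [3, 4, 5, 6, 7]))
print(paired_t([1, 2, 3, 4], [0, 0, 1, 1]))
```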
-
- (25) "XTAB" *
-
- The crosstab program generates 1-, 2- or 3-way crosstab reports.
- It allows the user to specify the crosstab criteria as well as a name
- for each row and column so that the report will be readable and
- easily interpreted.
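A two-way crosstab is a count of records for every combination of row and column codes, displayed under user-chosen names. The codes, names, and function below are hypothetical illustrations of that idea, not XTAB's report format.

```python
from collections import Counter

def crosstab(row_var, col_var, row_labels: dict, col_labels: dict) -> dict:
    """Count records for each (row code, column code) pair, keyed by label."""
    counts = Counter(zip(row_var, col_var))
    return {r_name: {c_name: counts[(r, c)] for c, c_name in col_labels.items()}
            for r, r_name in row_labels.items()}

sex = [1, 1, 2, 2, 2]
ill = [1, 0, 1, 1, 0]
table = crosstab(sex, ill, {1: "Male", 2: "Female"}, {1: "Ill", 0: "Well"})
for name, row in table.items():
    print(f"{name:<8}", "  ".join(f"{k}={v}" for k, v in row.items()))
```

A 3-way report would repeat this table once for each value of the third variable.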
-
-
-
-
-
-
- NOTICE
-
- ---------------------------------------------------------------------
- Users may copy EPISTAT and distribute it to others on the following
- conditions:
- 1. The programs are not modified in any way.
- 2. Individual programs are not distributed separately.
- 3. No fee is charged for copying or distribution.
- ---------------------------------------------------------------------
-
-
- ====USER-SUPPORTED SOFTWARE====
-
- The concept of user-supported software is based on three
- principles:
-
- 1. The value and utility of a software package are best assessed
- by each user on his or her own system with his or her own data.
- Only after using a program can one determine whether it serves
- one's personal applications, needs, and tastes.
-
- 2. The creation of independent personal computer software requires
- a substantial commitment of time and effort. Rather than
- duplicate this effort time after time, the computing community
- can and should support individual creative efforts.
-
- 3. By encouraging users to copy programs, rather than spending
- large sums on copy-protection, authors can supply quality
- software at reduced cost. Users will support useful programs.
-
-
- If, after using EPISTAT, you find it of value, your contribution
- in any amount will be appreciated ( $25 suggested ).
-
- Send contributions to:
-
- Tracy L. Gustafson, M.D.
- 1705 Gattis School Road
- Round Rock, Texas 78664
-
-
-
- Thank you, and good luck.